@maleadt maleadt commented Jan 12, 2026

Extends #2937, fixes #2946

Comment on lines +369 to +373
```julia
# `$T` is interpolated by an enclosing `for T in (...)` loop over the float types
# (the loop header falls outside this diff excerpt).
@eval begin
    @device_override @inline Base.FastMath.max_fast(x::$T, y::$T) = ifelse(y > x, y, x)
    @device_override @inline Base.FastMath.min_fast(x::$T, y::$T) = ifelse(y > x, x, y)
    @device_override @inline Base.FastMath.minmax_fast(x::$T, y::$T) = ifelse(y > x, (x, y), (y, x))
end
```
Contributor
Can you do something like

```julia
@device_override @inline Base.FastMath.max_fast(x::T, y::T) where {T<:Union{Float16, Float32, Float64}} = ifelse(y > x, y, x)
```

just to avoid the loop?
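The loop-free form being suggested can be sketched as follows, with plain functions standing in for the `@device_override` definitions (so no CUDA.jl or GPUCompiler machinery is needed to try it); `FastFloat` is a name introduced here for illustration:

```julia
# A single `where`-parameterized method per operation covers all three float
# types at once, replacing the metaprogrammed loop over types.
const FastFloat = Union{Float16, Float32, Float64}

max_fast(x::T, y::T) where {T<:FastFloat} = ifelse(y > x, y, x)
min_fast(x::T, y::T) where {T<:FastFloat} = ifelse(y > x, x, y)
minmax_fast(x::T, y::T) where {T<:FastFloat} = ifelse(y > x, (x, y), (y, x))
```

The branchless `ifelse` form is what lets these lower to hardware min/max instructions instead of control flow.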

Member Author

I'm always wary of doing so, because the Base method may then end up being more specific (and we really want these overrides to apply). In this case, Base doesn't use metaprogramming, so I guess it could work.
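The specificity worry above can be demonstrated with plain Julia dispatch (hypothetical function `f`, not part of the PR): a method on a concrete type always beats a `Union`-parameterized method, so a parameterized override would lose to any concrete-signature definition in Base.

```julia
# Julia picks the most specific applicable method; a concrete-type
# definition shadows a Union-parameterized one for that type.
f(x::T) where {T<:Union{Float32, Float64}} = :generic
f(x::Float32) = :specific

f(1.0f0)  # the concrete Float32 method wins
f(1.0)    # only the Union method applies
```

This is why per-type `@eval`-generated overrides are the safer default: each one is exactly as specific as a concrete Base method could be.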


codecov bot commented Jan 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.31%. Comparing base (da38676) to head (97f7c1e).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
```
@@           Coverage Diff           @@
##           master    #3016   +/-   ##
=======================================
  Coverage   89.31%   89.31%
=======================================
  Files         148      148
  Lines       12995    12995
=======================================
  Hits        11606    11606
  Misses       1389     1389
```

☔ View full report in Codecov by Sentry.

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 97f7c1e | Previous: da38676 | Ratio |
|---|---|---|---|
| latency/precompile | 55584314156 ns | 55180480419 ns | 1.01 |
| latency/ttfp | 7717855921.5 ns | 7807018821.5 ns | 0.99 |
| latency/import | 4025438307 ns | 4140006046.5 ns | 0.97 |
| integration/volumerhs | 9623132 ns | 9623640.5 ns | 1.00 |
| integration/byval/slices=1 | 146664 ns | 147119 ns | 1.00 |
| integration/byval/slices=3 | 425554 ns | 426080.5 ns | 1.00 |
| integration/byval/reference | 144978 ns | 145207 ns | 1.00 |
| integration/byval/slices=2 | 286179 ns | 286491 ns | 1.00 |
| integration/cudadevrt | 103398 ns | 103866 ns | 1.00 |
| kernel/indexing | 14096 ns | 14460 ns | 0.97 |
| kernel/indexing_checked | 14790 ns | 15250 ns | 0.97 |
| kernel/occupancy | 679.2064516129033 ns | 680.6470588235294 ns | 1.00 |
| kernel/launch | 2083.2 ns | 2197.4444444444443 ns | 0.95 |
| kernel/rand | 15013 ns | 18784 ns | 0.80 |
| array/reverse/1d | 19771 ns | 20177 ns | 0.98 |
| array/reverse/2dL_inplace | 66721 ns | 67023 ns | 1.00 |
| array/reverse/1dL | 69914 ns | 70421 ns | 0.99 |
| array/reverse/2d | 21925 ns | 22600 ns | 0.97 |
| array/reverse/1d_inplace | 11577 ns | 10092 ns | 1.15 |
| array/reverse/2d_inplace | 13285 ns | 13669 ns | 0.97 |
| array/reverse/2dL | 74024 ns | 74807 ns | 0.99 |
| array/reverse/1dL_inplace | 66907 ns | 67197 ns | 1.00 |
| array/copy | 20556 ns | 20909 ns | 0.98 |
| array/iteration/findall/int | 157600.5 ns | 159075 ns | 0.99 |
| array/iteration/findall/bool | 139621 ns | 140579 ns | 0.99 |
| array/iteration/findfirst/int | 160826 ns | 161050 ns | 1.00 |
| array/iteration/findfirst/bool | 161498 ns | 162329.5 ns | 0.99 |
| array/iteration/scalar | 73528 ns | 74588 ns | 0.99 |
| array/iteration/logical | 213816 ns | 216756.5 ns | 0.99 |
| array/iteration/findmin/1d | 91481 ns | 96345.5 ns | 0.95 |
| array/iteration/findmin/2d | 121664 ns | 122694 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 42940 ns | 43621 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 50581 ns | 44622.5 ns | 1.13 |
| array/reductions/reduce/Int64/dims=2 | 61611 ns | 61757 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 89000 ns | 89064 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88050 ns | 88156 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 36523.5 ns | 38304 ns | 0.95 |
| array/reductions/reduce/Float32/dims=1 | 42444 ns | 42098.5 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2 | 59795 ns | 60277 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52444 ns | 52722 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 71866 ns | 72438 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43028 ns | 43667 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 44303 ns | 45026 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2 | 61445 ns | 62167 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 89047 ns | 89081 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87991 ns | 88471 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 36026 ns | 38399 ns | 0.94 |
| array/reductions/mapreduce/Float32/dims=1 | 51970 ns | 41819.5 ns | 1.24 |
| array/reductions/mapreduce/Float32/dims=2 | 59669 ns | 60230.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52620 ns | 52828 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 72070 ns | 72045.5 ns | 1.00 |
| array/broadcast | 19836 ns | 20443 ns | 0.97 |
| array/copyto!/gpu_to_gpu | 12834 ns | 13022 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 214812 ns | 216732 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 287929 ns | 283462 ns | 1.02 |
| array/accumulate/Int64/1d | 124865 ns | 124912 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83675 ns | 84105 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 158830 ns | 158348 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1710434 ns | 1710807 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966856.5 ns | 966629 ns | 1.00 |
| array/accumulate/Float32/1d | 108995 ns | 109358 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80096 ns | 80805 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147591 ns | 148060.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619034 ns | 1619572.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 697841 ns | 698871 ns | 1.00 |
| array/construct | 1287.4 ns | 1243.1 ns | 1.04 |
| array/random/randn/Float32 | 46521.5 ns | 48790 ns | 0.95 |
| array/random/randn!/Float32 | 24942 ns | 25388 ns | 0.98 |
| array/random/rand!/Int64 | 27269 ns | 27419 ns | 0.99 |
| array/random/rand!/Float32 | 8736.333333333334 ns | 9072.666666666666 ns | 0.96 |
| array/random/rand/Int64 | 29927 ns | 29902 ns | 1.00 |
| array/random/rand/Float32 | 13355 ns | 13324 ns | 1.00 |
| array/permutedims/4d | 54940 ns | 57303.5 ns | 0.96 |
| array/permutedims/2d | 54110 ns | 54025.5 ns | 1.00 |
| array/permutedims/3d | 54939.5 ns | 55054.5 ns | 1.00 |
| array/sorting/1d | 2758522 ns | 2759246 ns | 1.00 |
| array/sorting/by | 3345836 ns | 3345927 ns | 1.00 |
| array/sorting/2d | 1080524 ns | 1082310 ns | 1.00 |
| cuda/synchronization/stream/auto | 1041.9 ns | 1045.9 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7703.5 ns | 7338.6 ns | 1.05 |
| cuda/synchronization/stream/blocking | 836.9480519480519 ns | 812.9347826086956 ns | 1.03 |
| cuda/synchronization/context/auto | 1193.2 ns | 1170.7 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 7080.9 ns | 7720.5 ns | 0.92 |
| cuda/synchronization/context/blocking | 933.8857142857142 ns | 910.6818181818181 ns | 1.03 |

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt maleadt merged commit aa310ac into master Jan 13, 2026
3 checks passed
@maleadt maleadt deleted the tb/llvm18 branch January 13, 2026 06:50

Labels

bugfix This gets something working again.


Development

Successfully merging this pull request may close these issues.

PTX compile error: ".NaN requires .target sm_80 or higher" on Julia 1.12 (RTX 2080 / sm_75, works fine on Julia 1.11.7)

3 participants